Scalable Multichannel Coding with HRTF Enhancement for DVD and Virtual Sound Systems
Abstract
The purpose of this paper is to investigate how transaural processing can enhance conventional multichannel audio, both by embedding perceptually relevant information and by improving image stability using additional loudspeakers integrated with supplementary digital processing and coding. The key objective is to achieve scalability in spatial performance while retaining full compatibility with conventional multichannel formats. In its most basic form, with unprocessed loudspeaker feeds, the system can therefore be used in a conventional multichannel installation. However, with appropriate signal processing, additional loudspeaker feeds can be derived, together with the option of exploiting buried data to extract more signals in order to improve spatial resolution. The system is therefore hierarchical in terms of the number of loudspeakers, channels, and ultimately spatial resolution, while in its simplest incarnation it remains fully compatible with the system configurations used with multichannel DVD-A and SACD replay equipment.
The multichannel capabilities of DVD technology [1], [2] were designed to enhance stereo sound reproduction by offering surround image and improved envelopment capabilities. Normally, multichannel audio encoded onto DVD assumes the ITU standard five-loudspeaker configuration driven by five discrete wide-band "loudspeaker feeds." However, a limitation of this system is the lack of a methodology to synthesize virtual images capable of three-dimensional audio (that is, a perception of direction, distance, and height together with acoustic envelopment) rather than just the "sound effects" often (although not exclusively) associated with surround sound in a home theater context. The ITU five-channel loudspeaker configuration can also be poor at side image localization, although this deficiency is closely allied to a sensitivity to room acoustics.
Nevertheless, DVD formats still offer only six discrete channels, which, if mapped directly into loudspeaker feeds, remain deficient in terms of image precision, especially if height and depth information is to be encoded. The techniques described in this paper support scalable spatial audio that can remain compatible with conventional multichannel systems. It is shown that in this class of system, under anechoic conditions, signal processing can in theory match the ear signals to those of either a real or an equivalent spatially synthesized sound source. Also, in order to improve image robustness, the directional sound-field encoding exploited in conventional surround sound is retained to match the image synthesized through transaural processing. It may be argued that as the number of channels is increased, there is convergence toward wavefront synthesis [3], where …
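The ear-signal matching idea in the abstract can be illustrated with a small numerical sketch. It is not the paper's algorithm: it assumes a single frequency, anechoic conditions, and a known 2 x N matrix of head-related transfer functions from N loudspeakers to the two ears, and it picks the minimum-norm set of loudspeaker feeds that reproduces a given pair of target ear signals. The function names (`min_norm_feeds`, `ear_signals`) and all numeric values are hypothetical placeholders, not measured data.

```python
# Sketch only: match target ear signals at ONE frequency with N loudspeakers.
# Assumptions (not from the paper): anechoic playback, known complex HRTFs,
# and a minimum-norm choice among the feeds that satisfy H g = e.

def min_norm_feeds(H, e):
    """Return minimum-norm loudspeaker feeds g such that H g = e.

    H: 2 x N list of complex HRTFs (row 0 = left ear, row 1 = right ear).
    e: target ear signals [e_left, e_right].
    Uses g = H^H (H H^H)^-1 e, which needs only a 2 x 2 inverse.
    """
    n = len(H[0])
    # A = H H^H, a 2 x 2 Hermitian matrix
    a00 = sum(abs(H[0][k]) ** 2 for k in range(n))
    a11 = sum(abs(H[1][k]) ** 2 for k in range(n))
    a01 = sum(H[0][k] * H[1][k].conjugate() for k in range(n))
    a10 = a01.conjugate()
    det = a00 * a11 - a01 * a10
    if abs(det) < 1e-12:
        raise ValueError("HRTF matrix is ill-conditioned at this frequency")
    # y = A^-1 e, using the closed-form inverse of a 2 x 2 matrix
    y0 = (a11 * e[0] - a01 * e[1]) / det
    y1 = (-a10 * e[0] + a00 * e[1]) / det
    # g = H^H y
    return [H[0][k].conjugate() * y0 + H[1][k].conjugate() * y1
            for k in range(n)]


def ear_signals(H, g):
    """Ear signals produced when the loudspeakers are driven with feeds g."""
    return [sum(row[k] * g[k] for k in range(len(g))) for row in H]


# Placeholder HRTFs for three loudspeakers (made-up values).
H = [[1.0 + 0.0j, 0.5 - 0.2j, 0.3 + 0.1j],
     [0.4 - 0.3j, 0.9 + 0.1j, 0.3 - 0.1j]]
# Placeholder ear signals that a real or virtual source would produce.
target = [0.8 + 0.1j, 0.2 - 0.4j]

feeds = min_norm_feeds(H, target)
reproduced = ear_signals(H, feeds)
```

With two loudspeakers the system is square and the solution is unique; with more loudspeakers there are infinitely many solutions, and the minimum-norm choice used here is only one option. A practical system would repeat this across frequency and add regularization where the matrix is poorly conditioned.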
Similar resources
Tutorial for ISIMP-2001 Recent Developments in Advanced Audio Processing
As DVD and home theater systems become more popular, high-fidelity multichannel (5.1-channel or 10.2-channel) audio systems are being well received in the market. Compared with traditional mono or stereo audio, multichannel audio requires a much more efficient coding scheme for its storage and transmission. This talk will present two new multichannel audio coding techniques: (i) the ...
Progressive Syntax-Rich Coding of Multichannel Audio Sources
Being able to transmit the audio bitstream progressively is a highly desirable property for network transmission. MPEG-4 version 2 audio supports fine-grain bit rate scalability in the generic audio coder (GAC). It has a bit-sliced arithmetic coding (BSAC) tool, which provides scalability in steps of 1 kbps per audio channel. There are also several other scalable audio coding methods, which h...
Virtual Audio System Customization Using Visual Matching of Ear Parameters
Applications in the creation of virtual auditory spaces (VAS) and sonification require individualized head related transfer functions (HRTFs) for perceptual fidelity. HRTFs exhibit significant variation from person to person due to differences between their pinnae, and their body sizes. In this paper we propose and preliminarily implement a simple HRTF customization based on use of a recently p...
Time and Frequency Decomposition of Head-Related Impulse Responses for the Development of Customizable Spatial Audio Models
This paper introduces a new approach to the decomposition of measured Head-Related Impulse Responses (HRIRs) based on simultaneous analysis in the time and frequency domains. This approach is computationally less demanding and faster than previous systematic approaches proposed for this purpose. Currently, HRIRs are the most usual representation of Head-Related Transfer Functions (HRTFs), which...